Method for specifying equivalence of language grammars and automatically translating sentences in one language to sentences in another language in a computer environment

ABSTRACT

A method for specifying equivalence of language grammars and automatically translating sentences in one language to sentences in another language in a computer environment. The method uses a unified grammar specification of grammars of different languages in a single unified representation of all the individual grammars where equivalent production rules of each of the grammars are merged into a single unified production rule. This method can be used to represent the equivalence of computer languages like high level language, assembly language and machine language and for translating sentences in any of these languages to another language.

FIELD OF INVENTION

The invention relates to a method for specifying equivalence of language grammars and the automatic translation of sentences in one language to sentences in another language in a computer environment.

BACKGROUND OF THE INVENTION

A language is basically a set of sentences that can be formed by following certain rules. The basic building block of any language is its alphabet. Numerous languages exist today, and all are built in this way. The sentences are collections of words that are formed from the letters of the alphabet. There are certain rules to be followed when putting these words together. These rules are called the grammar of the language and are unique to each language. These rules determine the valid sentences of the language. Thus one can define a grammar as a concise specification from which it is possible to generate all the valid sentences of the language. A grammar specifies the syntax or structure of a language, irrespective of whether it is a natural language such as English, a programming language such as ‘C’, or an assembly language.

Very often, it is required to convert sentences in one language to equivalent sentences in another language, for example from English to French or from a programming language to assembly language. To perform such tasks, the language grammars have to be specified, and the source language statements have to be validated and translated to sentences in the target language.

A method used in the prior art for translating one language to another is carried out in the following manner.

Define the source language grammar. Parse the sentences and convert them to a predefined intermediate format. Finally, translate the intermediate format to the target language.

The disadvantages of performing a translation by the above-mentioned method are the following:

-   (i) This method does not allow equivalence of the source language grammar and target language grammar to be specified. Thus there is no real correspondence between the source language grammar and the target language grammar.
-   (ii) Normally this method allows translation from one source language to one target language only. Mapping to multiple target languages is not possible.
-   (iii) The mapping from a source language to a target language is predefined, and thus supporting translations to new languages is difficult.

OBJECT OF THE INVENTION

Bearing in mind the problems and detriments of the prior art, the object of the present invention is to provide a method to automatically translate sentences from one language to another, overcoming the above-mentioned deficiencies.

Thus one object of the present invention is to be able to specify the equivalence of the source language grammar and the target language grammar.

Another object of the present invention is to allow mapping to multiple target languages. The method according to the invention should place no restriction on translating a source language to more than one target language.

DESCRIPTION OF THE INVENTION

The invention provides a method for representing equivalence of language grammars and for the automatic translation of sentences in one language to sentences in another language in a computer environment.

Let L₁ to L_(n) be n languages and G₁ to G_(n) represent the respective grammars for the languages L₁ to L_(n). Each grammar is unique to its language. Each grammar G₁ to G_(n) consists of a set of terminal symbols, a set of nonterminal symbols, a unique start symbol which is a nonterminal symbol, and a set of production rules. These production rules are the main aspect of the grammar. A production rule defines how to reduce a string of terminal and/or nonterminal symbols to a target nonterminal symbol.

In a grammar, there is at least one production rule that has the start symbol as its target nonterminal symbol. A sentence of a language may be defined as any string derived from the start symbol that is composed of only terminal symbols.
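The four components of a grammar described above can be pictured as a small data structure. The following is only an illustrative sketch in Python; the class name `Grammar`, its field names and the example symbols are assumptions made for this illustration and are not part of the specification. Note that `is_terminal_string` checks only that a string is composed of terminal symbols, not that it can be derived from the start symbol.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    """Hypothetical container for one grammar G_i (names are illustrative)."""
    terminals: set
    nonterminals: set
    start: str
    productions: list  # (target nonterminal, [symbols]) pairs

    def validate(self):
        symbols = self.terminals | self.nonterminals
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal symbol")
        for target, rhs in self.productions:
            if target not in self.nonterminals:
                raise ValueError(f"unknown target nonterminal: {target}")
            unknown = [s for s in rhs if s not in symbols]
            if unknown:
                raise ValueError(f"unknown symbols in rule for {target}: {unknown}")
        # At least one production rule must have the start symbol as its target.
        if not any(target == self.start for target, _ in self.productions):
            raise ValueError("no production rule targets the start symbol")

    def is_terminal_string(self, symbols):
        """True if the string consists of terminal symbols only."""
        return all(s in self.terminals for s in symbols)


# Assumed toy grammar for sums of the letters a and b.
example = Grammar(
    terminals={"a", "b", "+"},
    nonterminals={"expr", "term"},
    start="expr",
    productions=[
        ("expr", ["term", "+", "expr"]),
        ("expr", ["term"]),
        ("term", ["a"]),
        ("term", ["b"]),
    ],
)
example.validate()
```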

In the method according to the invention, a unified grammar specification is created for the grammars G₁ to G_(n) of all the languages L₁ to L_(n) respectively. Then the text in the source language is separated into a list of tokens using a conventional lexical analyser for the source language. A nonterminal symbol is set to the start symbol of the unified grammar specification. Then a set of grammar production rules is obtained for the said nonterminal symbol from the unified grammar specification. Each symbol is taken one by one from a list of terminal symbols and/or nonterminal symbols corresponding to the source language grammar, and it is determined whether it is a terminal symbol or a nonterminal symbol. For each terminal symbol obtained which is equivalent to a corresponding symbol in the list of tokens from the source language, the next symbol in the said list of terminal symbols and/or nonterminal symbols is considered. For each nonterminal symbol obtained which refers to another nonterminal symbol, a set of grammar production rules is obtained for that nonterminal symbol and the previous steps are repeated.

If all the symbols in the said list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar match with symbols in the said list of tokens of the input text, a list of symbols corresponding to the target language grammar is obtained from the said unified grammar production rule. If some symbols in the said list of terminal symbols and/or non-terminal symbols do not match with symbols in the said list of tokens, the earlier steps are repeated considering the next production rule from the set of production rules obtained for the non-terminal symbol.

Taking each symbol one by one from the said list of symbols corresponding to the target language grammar, it is determined whether it is a terminal symbol or a non-terminal symbol. Each terminal symbol obtained is provided as output. For each nonterminal symbol, another unified grammar production rule corresponding to that nonterminal symbol is obtained, and this step is repeated till all the symbols in the said list of symbols corresponding to the target language grammar are exhausted.
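The separation of the source text into a list of tokens, mentioned above, is left to a conventional lexical analyser. As a rough illustration only, a very small analyser could be sketched in Python as follows; the function name `tokenize` and the token classes (identifiers, integers, single punctuation characters) are assumptions chosen for this sketch, since the specification does not prescribe any particular lexical rules.

```python
import re

# Assumed token classes: identifiers, unsigned integers, and any other
# single non-blank character. Whitespace between tokens is skipped.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(text):
    """Split source text into a flat list of token strings."""
    return TOKEN_PATTERN.findall(text)
```

For example, under these assumed rules `tokenize("x1 = 42+y")` yields the five tokens `x1`, `=`, `42`, `+` and `y`.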

BRIEF DESCRIPTION OF DRAWINGS

FIG. I shows a system with which the method according to the invention can be implemented.

FIG. II shows the flow chart of the method according to the invention.

FIG. III shows the steps taken to create the unified grammar specification in the second step shown in FIG. II.

FIG. IV shows the steps taken to determine if all symbols in a unified grammar production rule match with the symbols in the token list ‘T’ in the sixth step of FIG. II and the seventh step of FIG. V.

FIG. V shows the steps taken to determine if a symbol from a unified grammar production rule matches with a symbol from the token list ‘T’ in the fourth step of FIG. IV.

FIG. VI shows the steps taken to obtain the sentence ‘L_(t)’ from a unified grammar production rule ‘P’ in the eighth step of FIG. II and in the seventh step of FIG. VI.

DESCRIPTION WITH REFERENCE TO THE DRAWINGS

The method according to the invention can be implemented by using a processing device (1) such as a microprocessor, a memory (2) and a user input device (3) connected to said processor (1). The user input device may be a keyboard or any other device which can provide information signals to the processor. The memory typically consists of a RAM and a ROM. According to the invention, the method of automatic translation of sentences from a source language L_(s) selected from a number of languages L₁ to L_(n) to a target language L_(t) selected from the number of languages L₁ to L_(n) comprises the following steps.

-   Step 1: Grammars G₁ to G_(n) of all the languages L₁ to L_(n) respectively and a text ‘S’ in the source language L_(s) are provided as inputs.
-   Step 2: A unified grammar specification UG is created for the grammars G₁ to G_(n).
-   Step 3: The input text ‘S’ in the source language L_(s) is separated into a list of tokens T using a lexical analyser for the source language L_(s).
-   Step 4: A nonterminal symbol ‘E’ is set to the start symbol of the unified grammar specification UG.
-   Step 5: A set of grammar production rules P_(e) is obtained by selecting the production rules which contain ‘E’ as their target non-terminal symbol from the unified grammar specification UG.
-   Step 6: For each unified grammar production rule P in the set of grammar production rules P_(e), taking each symbol one by one from a list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar G_(s), determine whether it is a terminal symbol or a non-terminal symbol.
-   Step 7: For each terminal symbol obtained from the previous step which is equivalent to a corresponding symbol in the list of tokens T of the input text in the source language L_(s), consider the next symbol in said list of terminal symbols and/or nonterminal symbols corresponding to the source language grammar G_(s). For each nonterminal symbol obtained from the previous step which refers to another nonterminal symbol E_(s) of the unified grammar specification UG, repeat step 5 onwards with the new nonterminal symbol E_(s).
-   Step 8: If all the symbols in the said list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar G_(s) match with all the symbols in the list of tokens T of the input text in the source language L_(s), obtain a list of symbols t corresponding to the target language grammar G_(t) from the unified grammar production rule P. If some symbols do not match, repeat step 6 onwards for the next unified grammar production rule P defined for the nonterminal symbol ‘E’.
-   Step 9: Take each symbol one by one from the list of symbols t corresponding to the target language grammar G_(t) and determine whether it is a terminal symbol or a non-terminal symbol.
-   Step 10: For each terminal symbol obtained from the previous step, output the symbol and consider the next symbol. For each nonterminal symbol obtained from the previous step, obtain another unified grammar production rule P corresponding to that nonterminal symbol and repeat the previous step with the new unified grammar production rule, till all the symbols in the list of symbols t corresponding to the target language grammar G_(t) are exhausted.
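Steps 4 to 10 above can be illustrated with a short Python sketch. It is one possible reading of the steps and not a definitive implementation. The names `UnifiedRule`, `match`, `generate` and `translate`, the toy languages `infix` and `postfix`, and the example grammar are all assumptions. The sketch further assumes that the rule matched for each nonterminal while reading the source is the one reused when producing the output, that repeated nonterminals correspond in order of occurrence, that every nonterminal in a target list also occurs in the source list, and that no rule is left-recursive in the source language.

```python
from typing import NamedTuple


class UnifiedRule(NamedTuple):
    target: str  # target nonterminal symbol of the unified rule
    rhs: dict    # language name -> list of symbols for that language


def match(ug, nonterminal, tokens, pos, src):
    """Yield (tree, next_pos) for each way `nonterminal` matches tokens[pos:]."""
    for rule in ug[nonterminal]:  # steps 5 and 6: try each rule P in turn
        yield from match_symbols(ug, rule, rule.rhs[src], tokens, pos, src, [])


def match_symbols(ug, rule, symbols, tokens, pos, src, children):
    if not symbols:  # every source-side symbol of this rule has matched
        yield (rule, children), pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in ug:  # nonterminal: repeat from step 5 with the new nonterminal
        for subtree, nxt in match(ug, head, tokens, pos, src):
            yield from match_symbols(ug, rule, rest, tokens, nxt, src,
                                     children + [(head, subtree)])
    elif pos < len(tokens) and tokens[pos] == head:  # terminal equals the token
        yield from match_symbols(ug, rule, rest, tokens, pos + 1, src, children)


def generate(ug, tree, tgt):
    """Steps 9 and 10: walk the target-side symbols of each matched rule."""
    rule, children = tree
    pending = list(children)
    out = []
    for sym in rule.rhs[tgt]:
        if sym in ug:  # nonterminal: reuse the subtree matched on the source side
            idx = next(i for i, (name, _) in enumerate(pending) if name == sym)
            out.extend(generate(ug, pending.pop(idx)[1], tgt))
        else:          # terminal: output the symbol
            out.append(sym)
    return out


def translate(ug, start, tokens, src, tgt):
    """Return the target token list, or None if the input does not match."""
    for tree, end in match(ug, start, tokens, 0, src):
        if end == len(tokens):  # step 8: all tokens of the input were consumed
            return generate(ug, tree, tgt)
    return None


# Assumed toy unified grammar relating infix and postfix sums.
UG = {
    "expr": [
        UnifiedRule("expr", {"infix": ["term", "+", "expr"],
                             "postfix": ["term", "expr", "+"]}),
        UnifiedRule("expr", {"infix": ["term"], "postfix": ["term"]}),
    ],
    "term": [UnifiedRule("term", {"infix": [name], "postfix": [name]})
             for name in "abc"],
}

print(translate(UG, "expr", ["a", "+", "b", "+", "c"], "infix", "postfix"))
# prints ['a', 'b', 'c', '+', '+']
```

Because each unified rule carries one symbol list per language, the same table can be used in either direction: swapping the `src` and `tgt` arguments translates the postfix form back to infix.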

The unified grammar specification UG for the grammars G₁ to G_(n) of the languages L₁ to L_(n) is created as follows. For every production rule P of the grammars G₁ to G_(n), a unified production rule P₁ is defined in the unified grammar specification UG, having the target nonterminal symbol of the production rule P as its target nonterminal symbol. A list of terminal symbols and/or nonterminal symbols is created in the said production rule P₁ for each grammar G₁ to G_(n), and each and every symbol in the list of terminal and/or nonterminal symbols that are represented by the target nonterminal symbol in the production rule P is added to the said unified production rule P₁. These steps are repeated for the next production rule of the grammars G₁ to G_(n).
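The construction described above can be sketched in Python as follows. The sketch assumes, purely for illustration, that equivalent production rules are listed in the same position for the same target nonterminal in every grammar; the specification does not state how equivalent rules are identified. The function name `build_unified_grammar` and the dictionary layout are likewise hypothetical.

```python
def build_unified_grammar(grammars):
    """Merge per-language grammars into one unified rule table.

    grammars: language name -> {target nonterminal -> list of symbol lists}.
    Returns:  target nonterminal -> list of unified rules, where each unified
              rule maps a language name to that language's symbol list.
    Assumption: the i-th rule for a nonterminal is equivalent in all grammars.
    """
    unified = {}
    for language, productions in grammars.items():
        for target, alternatives in productions.items():
            rules = unified.setdefault(target, [])
            for index, symbols in enumerate(alternatives):
                if index == len(rules):
                    rules.append({})  # define a new unified rule P1
                # Add this grammar's list of symbols to the unified rule.
                rules[index][language] = list(symbols)
    return unified


# Assumed toy grammars for infix and postfix sums.
grammars = {
    "infix": {"expr": [["term", "+", "expr"], ["term"]], "term": [["a"]]},
    "postfix": {"expr": [["term", "expr", "+"], ["term"]], "term": [["a"]]},
}
unified = build_unified_grammar(grammars)
```

Here `unified["expr"][0]` holds both `["term", "+", "expr"]` for the infix language and `["term", "expr", "+"]` for the postfix language, which is the merged form of the two equivalent rules.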

The method according to the invention can be used to represent the equivalence of multiple language grammars and for translating sentences of one language to another.

1. A method of automatic translation of sentences from a source language L_(s) selected from languages L₁ to L_(n) to a target language L_(t) selected from languages L₁ to L_(n) comprising the steps of: (i) providing grammars G₁ to G_(n) of all the languages L₁ to L_(n) respectively and a text ‘S’ in the source language L_(s) as inputs; (ii) creating a unified grammar specification UG for the grammars G₁ to G_(n); (iii) separating the input text ‘S’ in the source language L_(s) into a list of tokens T using a lexical analyser for the source language L_(s); (iv) setting a non-terminal symbol ‘E’ to the start symbol of the unified grammar specification UG; (v) obtaining a set of grammar production rules P_(e) which define the rules to reduce a string of terminal symbols and/or non-terminal symbols to the target non-terminal symbol E from the unified grammar specification UG; (vi) for each unified grammar production rule P in the set of grammar production rules P_(e), taking each symbol one by one from a list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar G_(s), determining whether it is a terminal symbol or a non-terminal symbol; (vii) for each terminal symbol obtained from the previous step which is equivalent to a corresponding symbol in the list of tokens T of the input text in the source language L_(s), considering the next symbol in said list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar G_(s), and for each non-terminal symbol obtained from the previous step which refers to another non-terminal symbol E_(s) of the unified grammar specification UG, repeating step (v) onwards with the new non-terminal symbol E_(s); (viii) if all the symbols in the said list of terminal symbols and/or non-terminal symbols corresponding to the source language grammar G_(s) match with all the symbols in the list of tokens T of the input text in the source language L_(s), obtaining a list of symbols t corresponding to the target language grammar G_(t) from the unified grammar production rule P, and for those symbols which do not match, repeating step (vi) onwards for the next unified grammar production rule P defined for the non-terminal symbol ‘E’; (ix) taking each symbol one by one from the list of symbols t corresponding to the target language grammar G_(t) and determining whether it is a terminal symbol or a non-terminal symbol; (x) for each terminal symbol obtained from the previous step, outputting the symbol and considering the next symbol, and for each non-terminal symbol obtained from the previous step, obtaining another unified grammar production rule P corresponding to that non-terminal symbol and repeating the previous step with the new unified grammar production rule, till all the symbols in the list of symbols t corresponding to the target language grammar G_(t) are exhausted.

2. The method as claimed in claim 1, wherein the unified grammar specification UG for the grammars G₁ to G_(n) of languages L₁ to L_(n) is created by the steps of: (i) for every production rule P of the grammars G₁ to G_(n) of the languages L₁ to L_(n), defining a unified production rule P₁ in the unified grammar specification UG having the target non-terminal symbol of the production rule P as its target non-terminal symbol; and (ii) for each grammar G₁ to G_(n), creating a list of terminal symbols and/or non-terminal symbols in the said production rule P₁ and adding each and every symbol in the list of terminal symbols and/or non-terminal symbols that are represented by the target non-terminal symbol in the production rule P to the said unified production rule P₁, and repeating the previous step for the next production rule of the grammars G₁ to G_(n).